{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Split an iterable into equal sized chunks.\n",
    "A common task and interview question, with many variants.  It's frequently [asked and answered](http://stackoverflow.com/questions/312443/) in a way that's suboptimal and only handles [one specific case](http://stackoverflow.com/questions/6822725/).  The goal here is to present definitive, general, and efficient solutions.\n",
    "\n",
    "The first variant is whether or not the chunks will overlap.  Although this could be generalized into a `step` parameter, it's nearly always the case that `step in (1, size)`.\n",
    "\n",
    "The second variant is whether the tail of the data should be returned, if it's not of equal size.  Clearly when `step == 1` that's unlikely to be desirable.  However, when `step == size`, that would seem to be the natural choice.  And again when `1 < step < size`, the desired behavior isn't clear at all.\n",
    "\n",
    "The third variant is whether to slice sequences, or support any iterable.  Obviously working for any iterable would be ideal, but it's also likely a user would expect slices given a sequence, particularly in the case of strings.\n",
    "\n",
    "So this author feels it's best to split the problem in 2 distinct cases:  a sliding `window` for overlapping sequences, and `chunks` for discrete sequences.  In each case, supporting iterables and using advanced iterator algebra for a minimal and efficient solution.\n",
    "\n",
    "## Window"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]"
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import itertools\n",
    "\n",
    "def window(iterable, size=2):\n",
    "    \"\"\"Generate a sliding window of values.\"\"\"\n",
    "    its = itertools.tee(iterable, size)\n",
    "    return zip(*(itertools.islice(it, index, None) for index, it in enumerate(its)))\n",
    "\n",
    "list(window('abcde'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "That's simple, and close to optimal.  There is slight overhead in iterating an `islice` object, so a minor variant would be to force the step-wise iteration in advance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]"
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import collections\n",
    "\n",
    "def window(iterable, size=2):\n",
    "    \"\"\"Generate a sliding window of values.\"\"\"\n",
    "    its = itertools.tee(iterable, size)\n",
    "    for index, it in enumerate(its):\n",
    "        collections.deque(itertools.islice(it, index), 0)  # exhaust iterator\n",
    "    return zip(*its)\n",
    "\n",
    "list(window('abcde'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Chunks\n",
    "A lesser-known and under-utilized feature of `iter` is that in can take a callable (of no arguments) and a sentinel to create an iterator.  A perfect use case of the \"loop and a half\" idiom."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "[('a', 'b', 'c'), ('d', 'e')]"
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "def chunks(iterable, size):\n",
    "    \"\"\"Generate adjacent chunks of data\"\"\"\n",
    "    it = iter(iterable)\n",
    "    return iter(lambda: tuple(itertools.islice(it, size)), ())\n",
    "\n",
    "list(chunks('abcde', 3))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This should also be optimal, for reasonable sizes.\n",
    "\n",
    "## Sequences with dispatch\n",
    "Rather than explicitly check `isinstance`, this is a perfect use case for `functools.singledispatch`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "['ab', 'bc', 'cd', 'de']"
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import functools\n",
    "from collections.abc import Sequence\n",
    "\n",
    "window = functools.singledispatch(window)\n",
    "\n",
    "@window.register\n",
    "def _(seq: Sequence, size=2):\n",
    "    for index in range(len(seq) - size + 1):\n",
    "        yield seq[index:index + size]\n",
    "\n",
    "list(window('abcde'))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "[('a', 'b'), ('b', 'c'), ('c', 'd'), ('d', 'e')]"
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(window(iter('abcde')))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "['abc', 'de']"
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "chunks = functools.singledispatch(chunks)\n",
    "\n",
    "@chunks.register\n",
    "def _(seq: Sequence, size):\n",
    "    for index in range(0, len(seq), size):\n",
    "        yield seq[index:index + size]\n",
    "\n",
    "list(chunks('abcde', 3))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": "[('a', 'b', 'c'), ('d', 'e')]"
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "list(chunks(iter('abcde'), 3))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "env": {},
   "interrupt_mode": "signal",
   "language": "python",
   "metadata": {},
   "name": "python3"
  },
  "nikola": {
   "category": "interviews",
   "date": "2019-12-28",
   "description": "",
   "link": "",
   "slug": "split-an-iterable",
   "tags": "",
   "title": "Split an Iterable",
   "type": "text"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}